An English to Turkish Machine Translation System Using Structural Mapping
Abstract
This paper describes the design and implementation of an English-Turkish machine translation (MT) system developed as part of the TU-Language project, supported by a NATO Science for Stability Project grant. The system uses a structural transfer approach to translate texts in the domain of IBM computer manuals. This paper presents the general design of the translation system and a detailed description of its transfer component.

1 Introduction

The TU-Language project, sponsored by the NATO Science for Stability Programme, was started in 1994 to establish computational foundations for natural language processing research on Turkish. It is a collaboration among the Computer Engineering Department of Middle East Technical University, the Computer Science Department of Bilkent University, and Halici Computing, Inc. The project undertakes extensive research on Turkish that will eventually lead to the development of an English to Turkish machine translation system, a Turkish language tutorial system, a Turkish dictionary, and other software tools for use in further research. This paper presents some issues in translating from English to Turkish, the translation domain, the outline of the machine translation system under development, and a detailed description of the transfer component.

2 Turkish Language

The morphology and syntax of Turkish are very different from those of English; therefore, the formalism used to represent English text has to be altered significantly to represent Turkish text. Turkish is characterized as a head-final language, in which the modifier/specifier always precedes the modified/specified. This characteristic also affects sentence word order, which can be described as SOV: the verb is positioned at the end. In addition, compared with other languages, Turkish relies more heavily on overt case markings, which mark the role of each argument in a sentence.
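The head-final SOV order and the role-marking function of case described above can be illustrated with a minimal Python sketch. It is not the TU-Language transfer component: the `Clause` structure, the functions `english_order`, `turkish_order` and `roles_from_case`, and the abstract `NOM`/`ACC` tags are all hypothetical. Surface morphology (for example, realizing kitap+ACC as kitabı, or leaving the nominative unmarked) is assumed to be handled by a separate generation step that is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Clause:
    """Hypothetical toy predicate-argument structure for a simple clause."""
    subject: str
    verb: str
    obj: Optional[str] = None


# Illustrative role-to-case mapping (assumption). Real Turkish case assignment
# also depends on definiteness (indefinite direct objects are typically
# unmarked) and on the verb's subcategorization; NOM is unmarked on the surface.
CASE_FOR_ROLE = {"subject": "NOM", "obj": "ACC"}


def english_order(c: Clause) -> List[str]:
    """English is SVO: subject, verb, then object."""
    words = [c.subject, c.verb]
    if c.obj is not None:
        words.append(c.obj)
    return words


def turkish_order(c: Clause) -> List[str]:
    """Head-final SOV order, with an abstract case tag on each argument."""
    words = [f"{c.subject}+{CASE_FOR_ROLE['subject']}"]
    if c.obj is not None:
        words.append(f"{c.obj}+{CASE_FOR_ROLE['obj']}")
    words.append(c.verb)  # the verb, head of the clause, comes last
    return words


def roles_from_case(words: List[str]) -> Dict[str, str]:
    """Recover argument roles from case tags alone, ignoring word position."""
    role_for_case = {case: role for role, case in CASE_FOR_ROLE.items()}
    roles: Dict[str, str] = {}
    for w in words:
        if "+" in w:
            stem, case = w.split("+", 1)
            roles[role_for_case[case]] = stem
        else:
            roles["verb"] = w  # untagged token is treated as the verb
    return roles


if __name__ == "__main__":
    clause = Clause(subject="ben", verb="oku", obj="kitap")  # gloss: "I read the book"
    print(english_order(Clause("I", "read", "the book")))  # ['I', 'read', 'the book']
    print(turkish_order(clause))  # ['ben+NOM', 'kitap+ACC', 'oku']
    # Scrambled order (object first): roles are still recoverable from case tags.
    print(roles_from_case(["kitap+ACC", "ben+NOM", "oku"]))
    # {'obj': 'kitap', 'subject': 'ben', 'verb': 'oku'}
```

Because roles are read off the case tags rather than off positions, any permutation of the tagged arguments yields the same role assignment. This is the property that permits Turkish's relatively free word order.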
The case markings enable Turkish to have a relatively free word order, in which every variation in the word order of a sentence results in a different meaning. In the MT system under development, these and other distinctive characteristics of Turkish are handled in the transfer and generation components.

3 Translation Domain

As more and more computer companies enter the Turkish market, a growing demand has emerged for English to Turkish translation of computer manuals. Other machine translation systems have also chosen computer manuals as their domain because of the relatively unambiguous and narrow sublanguage used (Tsutsumi, 1986). Nasukawa (Nasukawa, 1993) also reported that a statistical analysis of the text in IBM computer manuals showed that 92.6 percent of the words are used in the same word sense, which would significantly reduce the problem of lexical ambiguity resolution. Another advantage is that computer manuals are observed to be written as clearly as possible about a relatively narrow subject area, which should ease the difficult job of understanding and representing the input sentences. As a result of these observations, the TU-Language project team chose IBM computer manuals as its translation domain.

4 Machine Translation System

The English to Turkish MT system under development uses a structural transfer approach with the following components. First, the English sentence retrieved from the IBM manual is analyzed
Similar resources
Twisted Pair Grammar: Support for Rapid Development of Machine Translation for Low Density Languages
An English-to-Turkish Interlingual MT System p. 83 Rapid Prototyping of Domain-Specific Machine Translation Systems p. 95 Time-Constrained Machine Translation p. 103 An Evaluation of the Multi-Engine MT Architecture p. 113 An Ontology-Based Approach to Parsing Turkish Sentences p. 124 Monolingual Translator Workstation p. 136 Fast Document Translation for Cross-Language Information Retrieval p....
Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data. On the target side (Turkish), we only pe...
An English-to-Turkish Interlingual MT System
This paper describes the integration of a Turkish generation system with the KANT knowledge-based machine translation system to produce a prototype English–Turkish interlingua-based machine translation system. These two independently constructed systems were successfully integrated within a period of two months, through development of a module which maps KANT interlingua expressions to Turkish ...
Aligning Turkish and English Parallel Texts for Statistical Machine Translation
This paper presents a preliminary work on aligning Turkish and English parallel texts towards developing a statistical machine translation system for English and Turkish. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we have converted our parallel texts to a morphemic representation and then used standard word alignment a...